
Squashing port-to-postgresql further #1199

Closed
wants to merge 34 commits

Conversation

@brucebolt (Member) commented Dec 21, 2023

This branch was created from port-to-postgresql-squashed (PR #1196, a partially-cleaned version of #1085) yesterday, and squashed down to 39 commits by @brucebolt and @JonathanHallam. However, the commit messages still showed lots of artefacts from previous messy rebases: duplicated messages, or messages that no longer accurately described the changes.

I (@aldavidson) went through all the commits today, editing and rewording messages to create a cleaner history, and I also found a few more commits that could be squashed together. It's now ready for re-review.

This application is owned by the publishing platform team. Please let us know in #govuk-publishing-platform when you raise any PRs.

⚠️ This repo is Continuously Deployed: make sure you follow the guidance ⚠️

Follow these steps if you are doing a Rails upgrade.

@brucebolt force-pushed the test branch 3 times, most recently from 4f5d7cf to 8126a74 on December 21, 2023 at 16:11
@brucebolt changed the title from "Testing" to "Squashing port-to-postgresql further" on Dec 21, 2023
@aldavidson force-pushed the test branch 5 times, most recently from 310fb24 to 38683f0 on December 22, 2023 at 13:55
@aldavidson marked this pull request as ready for review on December 22, 2023 at 14:02
@brucebolt force-pushed the test branch 3 times, most recently from 1f8ea36 to 5932dfc on December 27, 2023 at 10:10
This removes the mongo configuration and migrates us to PostgreSQL.
This removes the MongoDB configuration and adds PostgreSQL.
These won't be needed after migration to PostgreSQL.
@brucebolt force-pushed the test branch 2 times, most recently from 09c79c1 to 18e2bd2 on December 27, 2023 at 14:18
This adds migrations which construct the basics of a database in
PostgreSQL and dumps the schema.

Remove mongo_instrumentation and setup in tests.
aldavidson and others added 23 commits December 27, 2023 14:26
Added bonus - we can now use PostgreSQL's JSONB search abilities to
simplify the previously very-verbose `content_items(term)` query.
These can be simplified after migration to PostgreSQL.
Resolve issue with before/after comparison failing in `should_register_routes?`.
`ContentItem.create_or_replace` had some required behaviour that didn't
seem to be tested for, such as exactly what to do with an existing
item on the given `base_path`.

Made sure that this behaviour is maintained in this branch, and
added explicit specs to test it.
This will allow us to cross-reference PostgreSQL records to MongoDB
records post-migration, if needed.
Support import of doubly-nested mongo date fields
Add field mappings for ScheduledPublishingLogEntry and PublishIntent
Add mongo_id field to user & scheduled_publishing_log_entry
Add `rails_timestamp` method to remove conflicts with ActiveRecord
 behaviour when doing .insert with some-but-not-all values given
Add support for batch_size in JsonImporter
This will allow us to perform side-by-side performance comparisons of
the Mongo and Postgres content-stores on the same hardware (e.g.
local dev laptop) and prove that the PostgreSQL content-store is at
least as performant as the Mongo version.
This improves response times to around 30% of previous values.
These will no longer be needed after migration to PostgreSQL.
Some records in the MongoDB have nil values in `created_at` or
`updated_at`. ActiveRecord's `timestamps` migration method by default
creates these fields without allowing nil values, so we must explicitly
add support for this after the fact.
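The fix described above would look something like the following follow-up migration (the class name and `content_items` table name here are illustrative assumptions, not the actual migration from this branch):

```
class AllowNullTimestamps < ActiveRecord::Migration[7.0]
  def change
    # Some imported MongoDB records have nil created_at/updated_at,
    # so relax the NOT NULL constraints that `timestamps` created.
    change_column_null :content_items, :created_at, true
    change_column_null :content_items, :updated_at, true
  end
end
```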
Some records in the old MongoDB have `description` as a simple value,
some have it as a Hash. We need to support both, and make sure that we
only wrap the given value in a Hash if it isn't one already.
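This normalisation can be sketched in plain Ruby (the method name and the "value" key are illustrative assumptions, not the repo's actual code):

```
# Wrap a plain description value in a Hash, but leave existing Hashes alone.
def wrap_description(description)
  return description if description.is_a?(Hash)

  { "value" => description }
end

wrap_description("About GOV.UK")
# => { "value" => "About GOV.UK" }
wrap_description({ "value" => "About GOV.UK" })
# => { "value" => "About GOV.UK" }
```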
This fixes a bug where unpublished redirects in short URL manager aren't removed
from the content store (so continue to work on the website).

Users of the content-store API (i.e. publishing-api) might make API calls with
values they want to reset provided as `nil`. For example, if you wanted to clear
some redirects on a content item, you might do something like:

```
PUT /content/some-made-up-url

{
  ...
  "redirects": null
  ...
}
```

The intent of the user of the API is clear here - they want no redirects.

However, ContentItem has a default value for redirects:

```
  field :redirects, type: Array, default: []
```

And the rest of the content-store expects this value to be an Array, not to be
nil.

By passing in potentially nil values in `assign_attributes`, we allow a situation
where fields that content-store expects not to be nil (because they have
defaults) can be nil. This tends to result in `NoMethodError`s, such as this one:

```
NoMethodError
undefined method `map' for nil:NilClass

    redirects = item.redirects.map(&:to_h).map(&:deep_symbolize_keys)
                              ^^^^
/app/app/models/route_set.rb:30:in `from_content_item'
/app/app/models/content_item.rb:215:in `route_set'
/app/app/models/content_item.rb:225:in `should_register_routes?'
/app/app/models/content_item.rb:193:in `register_routes'
/app/app/models/content_item.rb:33:in `create_or_replace'
/app/app/controllers/content_items_controller.rb:32:in `block in update'
```

I can't think of any valid reason for overriding default attributes with nils,
so calling `.compact` feels like the right thing to do here.
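The effect can be illustrated in plain Ruby (this is a simplified model of the attribute merge, not the actual `assign_attributes` code):

```
# Merging a nil payload value over a default clobbers the default,
# while compacting the payload first preserves it.
defaults = { "redirects" => [], "routes" => [] }
payload  = { "title" => "A page", "redirects" => nil }

clobbered = defaults.merge(payload)
# clobbered["redirects"] is nil, so a later .map on it raises NoMethodError

safe = defaults.merge(payload.compact)
# safe["redirects"] is still [], so .map is safe
```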
The `.any?` method, when called on a Relation, seems to instantiate
the objects in the result set, which is very slow on Kubernetes. It
worked fine in Mongoid, but not in ActiveRecord.

If we replace this with `.count.positive?`, it's much faster.
Whitehall doesn't do much validation of a given scheduled publishing
time. As a result, it can sometimes send us really extreme values for
`scheduled_publishing_delay_seconds` (e.g. 400 years into the future).

This can cause problems in the importer when Mongo has accepted the
value, but PostgreSQL can't. Changing the field type to `bigint`
fixes the issue.
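A minimal sketch of the kind of fix described, assuming the column lives on a `scheduled_publishing_log_entries` table (the class and table names here are illustrative):

```
class WidenScheduledPublishingDelay < ActiveRecord::Migration[7.0]
  def change
    # A 4-byte integer tops out around 2.1 billion; 400 years of seconds
    # (~12.6 billion) overflows it, so widen the column to bigint.
    change_column :scheduled_publishing_log_entries,
                  :scheduled_publishing_delay_seconds, :bigint
  end
end
```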
...it will fail due to the read-only filesystem in prod.

Also, some whitespace-only corrections to the migration
It turns out that you can't call `LogStasher.add_custom_fields` twice.
Because Content Store's custom fields config was being run after the
govuk_app_config gem's own custom fields config, the gem's config was
being overwritten. So, fields like `govuk_request_id` and `varnish_id`
weren't appearing in Content Store's controller request logs (but
`govuk_dependency_resolution_source_content_id` was).

The gem (version 9.7.0) now provides a mechanism for setting custom
fields that doesn't overwrite the gem's own settings
(https://docs.publishing.service.gov.uk/repos/govuk_app_config.html#logger-configuration):

```
GovukJsonLogging.configure do
  add_custom_fields do |fields|
    fields[:govuk_custom_field] = request.headers["GOVUK-Custom-Header"]
  end
end
```
An earlier commit on main (d5422b46 in PR #1136) fixed a subtle
issue when overriding default values with nil, by explicitly setting
`.created_at` and other attributes from the existing item when it was
being replaced. This caused issues on this PostgreSQL branch after
rebasing, as ActiveRecord behaves more as expected with respect to
`created_at` and therefore the line creating a local `created_at`
variable had been removed.

This commit reintroduces that variable, and tests now pass again.
@brucebolt (Member, Author) left a comment

This all looks great to me! A heroic effort migrating to PostgreSQL.

I've run rubocop after each commit to reduce the number of diffs in later commits, and rebased against main.

I can't approve the PR as I opened it. So my suggestion would be to push the test branch to a better named branch, then open another PR (titled "Port application to PostgreSQL", or similar), which I can approve.

@richardTowers (Contributor) left a comment

I had a read through this commit-by-commit (thanks for cleaning them up), and it all looks good to me.

I had one extremely minor nitpick comment about a comment, but other than that I'm happy for this to merge. Approving so you can choose to merge this PR if you like, but if you want to open another PR with a better branch name, that's fine too; let me know and I'll approve that too.

```
@@ -59,12 +63,10 @@ def self.find_by_path(path)
  ::FindByPath.new(self).find(path)
end

# We want to force the JSON representation to use "base_path" instead of
# "_id" to prevent "_id" being exposed outside of the model.
# We want to force the JSON representation to use "base_path"
```
Is 'We want to force the JSON representation to use "base_path"' still accurate? We've removed the confusing `hash["base_path"] = hash.delete("_id")` bit now.

@aldavidson commented Jan 2, 2024

Note for the record: we've decided the best way to resolve the situation (production being built from port-to-postgresql, but this branch test being the cleaner and rebased one) is as follows:

  1. force-push test to origin/port-to-postgresql (PR #1085, "Port to PostgreSQL")
  2. merge PR #1085 into main
  3. close this PR un-merged

This will allow us to maintain all the existing links & references to #1085 (in GitHub comments, commit messages, documents on Google Drive, Trello cards, etc.) as the primary source of truth and history.

@aldavidson commented:
I've force-pushed this branch to port-to-postgresql (#1085), so I'll close this unmerged. All tests have passed on #1085, so as soon as there's an approval on that, I can merge it.

@aldavidson closed this Jan 2, 2024
4 participants